AI accelerator

1. Overview
2. History
3. Ongoing research
4. Nomenclature
5. Deep learning processors (DLP)
- 5.1. Digital DLPs
- 5.2. Hybrid DLPs
6. Benchmarks
7. Potential applications
References

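1. Overview

An AI accelerator is a specialized hardware device that complements the CPU by accelerating specific tasks; its importance has grown with the rise of deep learning and artificial intelligence (AI) workloads. Early attempts used analog circuits or digital signal processors (DSPs), and the field has since developed through heterogeneous computing, GPUs, FPGAs, and ASICs. GPUs are widely used for deep learning, while FPGAs are used for AI acceleration because of their flexibility. ASICs achieve higher efficiency through more specialized designs, and companies including Google, Qualcomm, and Amazon are developing their own AI ASICs. Research on AI accelerators is active in areas such as in-memory computing, atomically thin semiconductors, and integrated photonic tensor cores, and performance is evaluated with benchmarks such as MLPerf. AI accelerators can be applied in many fields, including industrial robots, machine translation, autonomous vehicles, and medicine.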

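2. History

Computer systems have long complemented the CPU with coprocessors for specialized tasks. Notable application-specific expansion-card hardware includes video cards, sound cards, GPUs, and DSPs. As deep learning and artificial intelligence workloads grew in the 2010s, specialized hardware units were developed, or adapted from existing products, to accelerate these tasks.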

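2.1. Early attempts

Early attempts such as Intel's ETANN 80170NX incorporated analog circuits to compute neural functions.^[158] Fully digital chips such as the Nestor/Intel Ni1000 followed. As early as 1993, digital signal processors were used as neural network accelerators to speed up optical character recognition software.^[159] In 1988, Wei Zhang et al. discussed fast optical implementations of convolutional neural networks for alphabet recognition.^[160]^[161] In the 1990s there were also attempts to build parallel high-throughput systems for workstations, aimed at various applications including neural network simulation.^[162]^[163] FPGA-based accelerators were first explored in the 1990s for both inference and training.^[164]^[165] Starting with the Qualcomm Snapdragon 820 in 2015, AI accelerators began to appear in smartphones.^[166]^[167]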

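2.2. Heterogeneous computing

Heterogeneous computing integrates several specialized processors into a single system or a single chip, each optimized for a particular type of task. Architectures such as the Cell microprocessor^[168] have features that overlap significantly with AI accelerators, including support for packed low-precision arithmetic, a dataflow architecture, and prioritizing throughput over latency. The Cell microprocessor was applied to a variety of tasks^[28]^[29]^[30]^[31]^[32]^[33]^[34], including AI.^[169]^[170]^[171]^[172]^[173]^[174]

In the 2000s, video and gaming workloads drove CPUs to adopt increasingly wide SIMD units that also support packed low-precision data types.^[175]^[35]^[109] As CPU performance has improved, CPUs have also been used to run AI workloads. CPUs are well suited to DNNs with small or medium-scale parallelism, to sparse DNNs, and to low-batch-size scenarios.

In the 2020s, AI engines are increasingly being integrated into CPU chips. Examples include the Neural Engine in Apple's A-series and M-series chips^[110]^[111], AMD's Ryzen AI^[112]^[113], and the Neural Processing Unit (NPU) integrated into Intel processors from Meteor Lake onward^[114]^[115].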

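2.3. Use of GPUs

Graphics processing units (GPUs) are hardware originally specialized for manipulating and processing images. Because the mathematical basis of neural networks is similar to that of image manipulation, and because GPUs excel at the parallel processing involved in matrix operations, they have become widely used in machine learning, particularly deep learning.^[176]^[177]

In 2012, Alex Krizhevsky used two GPUs to train the deep learning network AlexNet^[38] and won the ILSVRC-2012 competition. During the 2010s, GPU manufacturers such as Nvidia improved the AI acceleration capability of their GPUs by adding deep-learning-related hardware features (for example, INT8 operators) and software (for example, the cuDNN library).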
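As a rough illustration of what an INT8 operator computes, the following sketch quantizes floating-point vectors to 8-bit integers and takes their dot product with integer accumulation. The function names, the symmetric per-tensor scale, and the clamping range are illustrative assumptions; this is not a description of any particular GPU instruction or of the cuDNN API.

```python
def quantize_int8(values, scale):
    """Map floats to integers in [-128, 127] using a symmetric scale.

    ASSUMPTION: one scale per vector; Python's round() rounds halves to even.
    """
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_dot(qa, qb, scale_a, scale_b):
    """Dot product of two quantized vectors.

    Products are accumulated as integers (a wider accumulator in hardware),
    then rescaled once at the end to approximate the floating-point result.
    """
    acc = sum(a * b for a, b in zip(qa, qb))
    return acc * scale_a * scale_b


if __name__ == "__main__":
    qa = quantize_int8([0.5, -1.0, 2.0], scale=0.5)
    qb = quantize_int8([0.5, 0.5, 0.5], scale=0.25)
    print(qa, qb, int8_dot(qa, qb, 0.5, 0.25))
```

The appeal of this scheme is that the inner loop uses only small integer multiplies and adds, with a single floating-point rescale per output value.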

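GPUs support deep learning for both training and inference in devices such as autonomous vehicles, and technologies such as Nvidia NVLink provide additional connectivity for AI workloads. GPU manufacturers are integrating hardware specialized for neural networks (for example, tensor cores) to improve AI acceleration further.^[41]^[42] Tensor cores are intended to speed up neural network training.^[42]

GPUs continue to be used in large-scale AI applications. For example, Summit, an IBM supercomputer at Oak Ridge National Laboratory^[43], contains 27,648 Nvidia Tesla V100 cards that can be used to accelerate deep learning algorithms.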

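2.4. Use of FPGAs

Because FPGAs are programmable and therefore highly flexible, they are used in the development of AI accelerators.^[178]^[164]^[165]^[179] Their reconfigurability makes it easier to evolve hardware, frameworks, and software alongside one another.^[44]^[19]^[20]^[45]

Microsoft has used FPGA chips to accelerate deep learning inference.^[180]^[128] Intel acquired Altera with the aim of integrating FPGAs into server CPUs, so that they can accelerate AI as well as general-purpose tasks.^[129]

Research on FPGA-based AI accelerators is also active in South Korea, particularly for special-purpose AI systems in areas such as telecommunications and defense.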

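2.5. Emergence of dedicated AI accelerator ASICs

While GPUs and FPGAs perform far better than CPUs on AI-related tasks, a more specific design implemented as an application-specific integrated circuit (ASIC)^[181] can gain up to a factor of 10 in efficiency.^[182]^[183] These accelerators use strategies such as optimized memory use and lower-precision arithmetic to accelerate calculation and increase computational throughput.^[184]^[185] Low-precision floating-point formats used for AI acceleration include the half-precision and bfloat16 floating-point formats.^[186]^[187]^[188]^[189]^[190]^[191]^[192] Companies such as Google, Qualcomm, Amazon, Apple, Facebook, AMD, and Samsung are all designing their own AI ASICs.^[193]^[194]^[195]^[196]^[197]^[198] Cerebras Systems built a dedicated AI accelerator based on the second-generation Wafer Scale Engine (WSE-2), the largest processor in the industry, to support deep learning workloads.^[199]^[200]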
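To make the two low-precision formats more concrete, the sketch below converts a value to bfloat16 and to half precision using only the Python standard library. bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an IEEE 754 single-precision value, so it can be obtained by keeping the upper 16 bits. The helper names are illustrative, and plain truncation is used here as a simplifying assumption; real hardware usually rounds instead.

```python
import math
import struct


def float32_bits(x: float) -> int:
    """Bit pattern of x as an IEEE 754 single-precision value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Upper 16 bits of the float32 pattern (ASSUMPTION: truncate, no rounding)."""
    return float32_bits(x) >> 16


def from_bfloat16_bits(bits: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def to_float16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    for value in (1.0, math.pi, 100000.0):
        bf16 = from_bfloat16_bits(to_bfloat16_bits(value))
        print(value, "-> bfloat16:", bf16)
    print(math.pi, "-> float16:", to_float16(math.pi))
```

The trade-off the two formats make is visible in the bit layout: bfloat16 keeps the full single-precision exponent range with a short mantissa, while half precision spends more bits on the mantissa and fewer on the exponent.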

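3. Ongoing research

In June 2017, IBM researchers announced an architecture based on in-memory computing, in contrast to the von Neumann architecture. It uses phase-change memory arrays applied to temporal correlation detection, and the researchers aimed to generalize the approach to heterogeneous computing and massively parallel systems.^[143] In October 2018, they announced a neuromorphic engineering architecture, based on in-memory processing and modeled on the synaptic network of the human brain, to accelerate deep learning.^[144] The system is based on phase-change memory arrays.^[145]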

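3.1. In-memory computing architectures

In June 2017, IBM researchers announced an architecture that, in contrast to the von Neumann architecture, is based on in-memory computing and phase-change memory arrays applied to temporal correlation detection, intending to generalize the approach to heterogeneous computing and massively parallel systems.^[56] In October 2018, they announced an in-memory processing architecture based on neuromorphic engineering and modeled on the synaptic network of the human brain, designed to accelerate deep neural networks.^[57] The system is based on phase-change memory arrays.^[58]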

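3.2. In-memory computing with analog resistive memories

In 2019, researchers from Politecnico di Milano found a way to solve systems of linear equations in a few tens of nanoseconds with a single operation. Their algorithm is based on in-memory computing with analog resistive memories, which achieves high time and energy efficiency by performing matrix-vector multiplication in one step using Ohm's law and Kirchhoff's law.^[59]^[146] The researchers showed that a feedback circuit with cross-point resistive memories can solve systems of linear equations, matrix eigenvectors, and differential equations in just one step. This approach drastically reduces computation time compared with digital algorithms.^[59]^[146]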
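The one-step matrix-vector multiplication can be pictured with an idealized numerical model of a resistive crossbar. In the sketch below, each cell contributes a current given by Ohm's law, and the currents on each column wire add up by Kirchhoff's current law. The function name and the ideal-device assumptions (no wire resistance, no noise) are illustrative; the sketch covers only the forward product and not the feedback circuit used in the cited work to solve equations.

```python
def crossbar_currents(conductances, voltages):
    """Column currents of an idealized resistive crossbar.

    conductances[i][j]: conductance (siemens) of the cell joining row i and column j
    voltages[i]:        voltage (volts) applied to row i

    Ohm's law gives the cell current I_ij = G_ij * V_i, and Kirchhoff's
    current law sums the cell currents on each column wire, so the column
    currents equal the product of the transposed conductance matrix and
    the voltage vector.
    """
    n_rows = len(conductances)
    if len(voltages) != n_rows:
        raise ValueError("one voltage per crossbar row is required")
    n_cols = len(conductances[0])
    return [
        sum(conductances[i][j] * voltages[i] for i in range(n_rows))
        for j in range(n_cols)
    ]


if __name__ == "__main__":
    g = [[1.0, 2.0],
         [3.0, 4.0]]
    v = [1.0, 0.5]
    print(crossbar_currents(g, v))
```

In the physical array all of these multiplications and additions happen at once in the analog domain, which is why the product takes a single step regardless of how the loop above is written.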

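3.3. Atomically thin semiconductors

In 2020, Marega et al. published experiments with a large-area active channel material for developing logic-in-memory devices and circuits based on floating-gate field-effect transistors (FGFETs).^[60] They regarded atomically thin semiconductors as promising for energy-efficient machine learning applications, in which the same basic device structure is used for both logic operations and data storage. Using two-dimensional materials such as semiconducting molybdenum disulfide to precisely tune FGFETs, the authors developed building blocks in which logic operations can be performed with the memory elements.^[60]^[147]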

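3.4. Integrated photonic tensor cores

In 2021, J. Feldmann et al. proposed an integrated photonic hardware accelerator for parallel convolutional processing.^[61] The authors identify two key advantages of integrated photonics over its electronic counterparts: (1) massively parallel data transfer through wavelength-division multiplexing in conjunction with frequency combs, and (2) extremely high data modulation speeds.^[61] Their system can execute trillions of multiply-accumulate operations per second, indicating the potential of integrated photonics in data-heavy AI applications.^[61] Optical processors that can also perform backpropagation for artificial neural networks have been developed experimentally.^[62]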

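4. Nomenclature

As of 2016, the field of AI accelerators was still at an early stage of development and had no established terminology. Vendors push their own marketing terms for what amounts to an "AI accelerator", hoping that their designs and APIs will become the dominant design.^[64] There is no consensus on the boundaries between these devices or on the exact form they will take, but several examples clearly aim to fill this new space, with a fair amount of overlap in capabilities.

In the past, when consumer graphics accelerators emerged, the industry took various forms before settling on an overall graphics pipeline that implemented the model presented by Direct3D. The industry eventually adopted Nvidia's self-assigned term, "GPU"^[148], as the collective noun for "graphics accelerators".^[63]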

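5. Deep learning processors (DLP)

Inspired by the pioneering work of the DianNao family, many deep learning processors (DLPs) have been proposed in both academia and industry, designed to exploit the characteristics of deep neural networks for high efficiency. At ISCA 2016, 15% of the accepted papers focused on architecture designs for deep learning.

Efforts to develop DLPs have been active in both academia and industry. In academia, representative examples include Eyeriss (MIT)^[65], EIE (Stanford University)^[66], Minerva (Harvard University)^[67], and Stripes (University of Toronto)^[68]. In industry, Google's TPU^[69] and Cambricon's MLU^[70], among others, have led DLP development.

The following table summarizes representative DLP work.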

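Year	DLP	Institution	Type	Computation	Memory hierarchy	Control	Peak performance
2014	DianNao^[21]	Chinese Academy of Sciences (ICT, CAS)	Digital	Vector MACs	Scratchpad	VLIW	452 Gops (16-bit)
2014	DaDianNao^[22]	Chinese Academy of Sciences (ICT, CAS)	Digital	Vector MACs	Scratchpad	VLIW	5.58 Tops (16-bit)
2015	ShiDianNao^[23]	Chinese Academy of Sciences (ICT, CAS)	Digital	Scalar MACs	Scratchpad	VLIW	194 Gops (16-bit)
2015	PuDianNao^[24]	Chinese Academy of Sciences (ICT, CAS)	Digital	Vector MACs	Scratchpad	VLIW	1,056 Gops (16-bit)
2016	DnnWeaver	Georgia Institute of Technology	Digital	Vector MACs	Scratchpad	-	-
2016	EIE^[66]	Stanford University	Digital	Scalar MACs	Scratchpad	-	102 Gops (16-bit)
2016	Eyeriss^[65]	MIT	Digital	Scalar MACs	Scratchpad	-	67.2 Gops (16-bit)
2016	Prime^[71]	University of California, Santa Barbara (UCSB)	Hybrid	Process-in-Memory	ReRAM	-	-
2017	TPU^[69]	Google	Digital	Scalar MACs	Scratchpad	CISC	92 Tops (8-bit)
2017	PipeLayer^[75]	University of Pittsburgh	Hybrid	Process-in-Memory	ReRAM	-	-
2017	FlexFlow	Chinese Academy of Sciences (ICT, CAS)	Digital	Scalar MACs	Scratchpad	-	420 Gops
2017	DNPU^[72]	Korea Advanced Institute of Science and Technology (KAIST)	Digital	Scalar MACs	Scratchpad	-	300 Gops (16-bit)
2018	MAERI	Georgia Institute of Technology	Digital	Scalar MACs	Scratchpad	-	-
2018	PermDNN	City University of New York	Digital	Vector MACs	Scratchpad	-	614.4 Gops (16-bit)
2018	UNPU^[73]	Korea Advanced Institute of Science and Technology (KAIST)	Digital	Scalar MACs	Scratchpad	-	345.6 Gops (16-bit)
2019	FPSA	Tsinghua University	Hybrid	Process-in-Memory	ReRAM	-	-
2019	Cambricon-F	Chinese Academy of Sciences (ICT, CAS)	Digital	Vector MACs	Scratchpad	FISA	14.9 Tops (F1, 16-bit)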

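Heterogeneous computing refers to integrating into one system or chip several processors optimized for specific tasks. The Cell B.E. microprocessor^[102] shares many characteristics with AI accelerators, such as support for packed low-precision arithmetic, a dataflow architecture, and prioritizing throughput over latency. The Cell processor was later applied to a variety of tasks, including AI.^[103]^[104]^[105]^[106]^[107]^[108]

In the 2000s, as video and gaming workloads increased, CPUs gradually widened the data width of their SIMD units and added support for packed low-precision data types.^[109]

In the 2020s, there is a trend toward integrating AI engines into CPU chips. Examples include the Neural Engine in Apple's A-series and M-series chips^[110]^[111], AMD's Ryzen AI^[112]^[113], and the Neural Processing Unit (NPU) integrated into Intel processors from Meteor Lake onward^[114]^[115].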

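5.1. Digital DLPs

The major components of a DLP architecture are a computation component, an on-chip memory hierarchy, and control logic that manages data communication and computing flows.

Because most operations in deep learning can be aggregated into vector operations, the most common way to build the computation component in digital DLPs is a MAC-based (multiply-accumulate) organization, using either vector MACs^[21]^[22]^[24] or scalar MACs.^[69]^[23]^[65] Deep-learning-specific parallelism is exploited more effectively in these MAC-based organizations. DLPs typically use relatively large on-chip buffers, together with dedicated on-chip data reuse and data exchange strategies, to reduce the memory bandwidth burden. For example, DianNao, with 16 16-input vector MACs, requires almost 1024 GB/s of bandwidth between the computation components and the buffers; on-chip reuse greatly reduces this requirement.^[21] DLPs use scratchpad memory instead of caches, which provides more opportunities for data reuse.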
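The vector-MAC organization and the order of magnitude of the bandwidth figure can be sketched as follows. The unit count and the 16-input width come from the text; the 16-bit operand size matches the table's 16-bit peak figures. The roughly 1 GHz clock and the premise that every multiplier fetches one weight and one input per cycle with no reuse are assumptions made for this sketch, so it shows one way to arrive at a number of this size, not the cited paper's own derivation.

```python
NUM_UNITS = 16          # vector MAC units (from the text)
INPUTS_PER_UNIT = 16    # inputs per vector MAC (from the text)
BYTES_PER_VALUE = 2     # 16-bit operands
CLOCK_HZ = 1e9          # ASSUMPTION: about 1 GHz


def vector_mac(weights, inputs, acc=0):
    """One vector MAC: multiply element-wise and add everything to acc."""
    return acc + sum(w * x for w, x in zip(weights, inputs))


def layer_step(weight_rows, inputs):
    """One step of the array: every unit applies its own weight row to the inputs."""
    return [vector_mac(row, inputs) for row in weight_rows]


def operand_bandwidth_gb_per_s():
    """Operand traffic if nothing is reused on chip (ASSUMPTION, see lead-in)."""
    values_per_cycle = NUM_UNITS * INPUTS_PER_UNIT * 2  # one weight + one input per multiplier
    return values_per_cycle * BYTES_PER_VALUE * CLOCK_HZ / 1e9


if __name__ == "__main__":
    print(layer_step([[1, 2], [3, 4]], [1, 1]))
    print(operand_bandwidth_gb_per_s(), "GB/s without reuse")
```

In `layer_step` all units read the same input vector, which hints at why on-chip reuse helps: broadcasting shared inputs instead of fetching them once per multiplier removes a large share of the operand traffic.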

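DLPs use dedicated ISAs (instruction set architectures) to support the deep learning domain flexibly. DianNao used a VLIW-style instruction set, and Cambricon^[74] introduced the first deep-learning-domain-specific ISA, which can support more than ten different deep learning algorithms. The TPU also exposes five key instructions from its CISC-style ISA.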

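5.2. Hybrid DLPs

Hybrid DLPs offer high efficiency for accelerating DNN inference and training. Processing-in-memory (PIM) architectures are one of the most important types of hybrid DLP. The key design idea of PIM is to bridge the gap between computing and memory by moving computation components into memory cells, controllers, or memory chips.^[75]^[76]^[77] Such architectures shorten data paths and exploit higher internal bandwidth, which improves performance. They also adopt computational devices to build highly efficient DNN engines. In 2013, HP Labs demonstrated the capability of adopting ReRAM crossbar structures for computing.^[78] Building on this work, research exploring new architectures and system designs based on ReRAM,^[71]^[79]^[80]^[75] phase-change memory,^[76]^[81]^[82] and other devices is actively under way.

In June 2017, IBM researchers announced an architecture that contrasts with the von Neumann architecture, based on in-memory computing and phase-change memory arrays applied to temporal correlation detection, with the goal of generalizing the approach to heterogeneous computing and massively parallel systems.^[143] In October 2018, IBM researchers announced a neuromorphic engineering architecture, based on in-memory processing and modeled on the synaptic network of the human brain, to accelerate deep neural networks.^[144] The system is based on phase-change memory arrays.^[145]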

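6. Benchmarks

Benchmarks such as MLPerf can be used to evaluate the performance of AI accelerators.^[83]

AI accelerator benchmarks
Year	NN benchmark	Affiliation	Number of micro-benchmarks	Number of component benchmarks	Number of application benchmarks
2012	BenchNN	Chinese Academy of Sciences (ICT, CAS)	N/A	12	N/A
2016	Fathom	Harvard University	N/A	8	N/A
2017	BenchIP	Chinese Academy of Sciences (ICT, CAS)	12	11	N/A
2017	DAWNBench	Stanford University	8	N/A	N/A
2017	DeepBench	Baidu	4	N/A	N/A
2018	AI Benchmark	ETH Zurich	N/A	26	N/A
2018	MLPerf	Harvard University, Intel, Google, and others	N/A	7	N/A
2019	AIBench	Chinese Academy of Sciences (ICT, CAS), Alibaba, and others	12	16	2
2019	NNBench-X	University of California, Santa Barbara	N/A	10	N/A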
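The table separates micro-benchmarks from component and application benchmarks. Assuming that a micro-benchmark here means timing a single kernel in isolation, a minimal sketch might time a dense matrix multiplication and report multiply-accumulate operations per second. The harness, its function names, and the choice of kernel are illustrative assumptions and are not taken from any of the suites listed above.

```python
import time


def matmul(a, b):
    """Naive dense matrix product of two lists of lists."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for row in a
    ]


def time_kernel(kernel, *args, repeats=5):
    """Best wall-clock time in seconds over several runs of kernel(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(*args)
        best = min(best, time.perf_counter() - start)
    return best


def matmul_macs_per_second(size=32, repeats=5):
    """Throughput of the naive kernel on size x size matrices of ones."""
    a = [[1.0] * size for _ in range(size)]
    b = [[1.0] * size for _ in range(size)]
    seconds = max(time_kernel(matmul, a, b, repeats=repeats), 1e-9)
    return size ** 3 / seconds  # a size^3 product performs size^3 MACs


if __name__ == "__main__":
    print(f"{matmul_macs_per_second():.3e} MAC/s")
```

Taking the best of several runs is a common way to reduce noise from the operating system; a real suite would also control for warm-up, data types, and batch size.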

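7. Potential applications

AI accelerators can be applied in a variety of fields, including the following.

Industrial robots: increase the range of tasks that can be automated by improving adaptability to variable situations.
Machine translation
Military robots
Natural language processing
Search engines: increase the energy efficiency of data centers and enable more advanced information retrieval.
Unmanned aerial vehicles: for example, navigation systems; the Movidius Myriad 2 has been used successfully to guide autonomous drones.^[86]
Voice user interfaces: for example, in mobile phones; a target for Qualcomm Zeroth.^[87]
Agricultural robots: for example, weed control without herbicides.^[84]
Autonomous vehicles: Nvidia has targeted its Drive PX-series boards at this application.^[85]
Computer-aided diagnosis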

References

_[1] Website Intel unveils Movidius Compute Stick USB AI Accelerator https://www.v3.co.uk[...] 2017-08-11
_[2] Website Inspurs unveils GX4 AI Accelerator https://insidehpc.co[...] 2017-06-21
_[3] Publication Neural Magic raises $15 million to boost AI inferencing speed on off-the-shelf processors https://venturebeat.[...] 2020-03-14
_[4] Website Google Designing AI Processors https://www.eetimes.[...] 2016-05-18
_[5] Website Nvidia reveals new Hopper H100 GPU, with 80 billion transistors https://www.datacent[...] 2024-01-30
_[6] Website HUAWEI Reveals the Future of Mobile AI at IFA https://consumer.hua[...]
_[7] Website Intel's Lunar Lake Processors Arriving Q3 2024 https://www.intel.co[...]
_[8] Website AMD XDNA Architecture https://www.amd.com/[...]
_[9] Website Deploying Transformers on the Apple Neural Engine https://machinelearn[...] 2023-08-24
_[10] Journal In-Datacenter Performance Analysis of a Tensor Processing Unit 2017-06-24
_[11] Website How silicon innovation became the 'secret sauce' behind AWS's success https://www.amazon.s[...] 2024-07-19
_[12] Website Nvidia's New China AI Chips Circumvent US Restrictions https://www.semianal[...] 2024-02-07
_[13] Website Inside Track https://archive.org/[...] PC Magazine 2023-12-26
_[14] YouTube convolutional neural network demo from 1993 featuring DSP32 accelerator https://www.youtube.[...] 2014-06-02
_[15] Journal Shift-invariant pattern recognition neural network and its optical architecture 1988
_[16] Journal Parallel distributed processing model with local space-invariant interconnections and its optical architecture 1990
_[17] Journal Designing a connectionist network supercomputer https://www.research[...] ResearchGate 2023-12-26
_[18] YouTube The end of general purpose computers (not) https://www.youtube.[...] 2015-04-17
_[19] Website Space Efficient Neural Net Implementation https://www.research[...] 2023-12-26
_[20] Book 1996 IEEE International Symposium on Circuits and Systems. Circuits and Systems Connecting the World. ISCAS 96
_[21] Journal DianNao 2014-04-05
_[22] Book 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture IEEE 2014-12
_[23] Journal ShiDianNao 2016-01-04
_[24] Journal PuDianNao 2015-05-29
_[25] Journal DianNao family 2016-10-28
_[26] Website Qualcomm Helps Make Your Mobile Devices Smarter With New Snapdragon Machine Learning Software Development Kit https://www.qualcomm[...]
_[27] Website Qualcomm's Zeroth platform could make your smartphone much smarter https://www.cnet.com[...] 2021-09-28
_[28] Journal Synergistic Processing in Cell's Multicore Architecture
_[29] Journal Performance of Cell processor for biomolecular simulations
_[30] Book Video Processing and Retrieval on Cell architecture
_[31] Book 2006 IEEE Symposium on Interactive Ray Tracing
_[32] Website Development of an artificial neural network on a heterogeneous multicore architecture to predict a successful weight loss in obese individuals https://www.teco.edu[...] 2017-11-14
_[33] Book 2008 5th IEEE Consumer Communications and Networking Conference
_[34] Book Euro-Par 2008 – Parallel Processing
_[35] Website Improving the performance of video with AVX https://software.int[...] 2012-02-08
_[36] Website High Performance Convolutional Neural Networks for Document Processing https://inria.hal.sc[...] 10th International Workshop on Frontiers in Handwriting Recognition 2006-10-23
_[37] Publication ImageNet Classification with Deep Convolutional Neural Networks 2017-05-24
_[38] Publication ImageNet classification with deep convolutional neural networks 2017-05-24
_[39] Website Nvidia in the Driver's Seat for Deep Learning https://insidehpc.co[...] insideHPC 2023-05-17
_[40] Website Nvidia announces 'supercomputer' for self-driving cars at CES 2016 https://www.theverge[...] Vox Media 2016-01-05
_[41] Document A Survey on Optimized Implementation of Deep Learning Models on the NVIDIA Jetson Platform https://www.research[...] 2019
_[42] Website CUDA 9 Features Revealed: Volta, Cooperative Groups and More https://developer.nv[...] 2017-05-11
_[43] Website Summit: Oak Ridge National Laboratory's 200 petaflop supercomputer https://www.olcf.orn[...] United States Department of Energy 2024
_[44] Book 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) 2019-08
_[45] Website FPGA Based Deep Learning Accelerators Take on ASICs http://www.nextplatf[...] 2016-08-23
_[46] Website Microsoft unveils Project Brainwave for real-time AI https://www.microsof[...] 2017-08-22
_[47] Website Google boosts machine learning with its Tensor Processing Unit https://techreport.c[...] 2016-05-19
_[48] Website Chip could bring deep learning to mobile devices https://www.scienced[...] 2016-02-03
_[49] Website Google Cloud announces the 5th generation of its custom TPUs https://techcrunch.c[...] 2023-08-29
_[50] Website Deep Learning with Limited Numerical Precision http://proceedings.m[...]
_[51] arXiv XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
_[52] Website Intel To Launch Spring Crest, Its First Neural Network Processor, In 2019 https://www.tomshard[...] 2018-05-23
_[53] Report TensorFlow Distributions 2017-11-28
_[54] Website Cerebras Hits the Accelerator for Deep Learning Workloads https://www.datanami[...] 2021-11-01
_[55] Website Cerebras launches new AI supercomputing processor with 2.6 trillion transistors https://venturebeat.[...] 2021-04-20
_[56] Publication Temporal correlation detection using computational phase-change memory
_[57] News A new brain-inspired architecture could improve how computers handle data and advance AI https://phys.org/new[...] 2018-10-03
_[58] Publication In-memory computing on a photonic platform
_[59] Publication Solving matrix equations in one step with cross-point resistive arrays
_[60] Publication Logic-in-memory based on an atomically thin semiconductor
_[61] Publication Parallel convolutional processing using an integrated photonic tensor
_[62] Website Photonic Chips Curb AI Training's Energy Appetite - IEEE Spectrum https://spectrum.iee[...]
_[63] Website NVIDIA launches the World's First Graphics Processing Unit, the GeForce 256 http://www.nvidia.co[...]
_[64] Website Intel to Bring a 'VPU' Processor Unit to 14th Gen Meteor Lake Chips https://www.pcmag.co[...]
_[65] Publication Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks 2017
_[66] Book EIE: Efficient Inference Engine on Compressed Deep Neural Network 2016-02-03
_[67] Book 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) IEEE 2016-06
_[68] Publication Stripes: Bit-Serial Deep Neural Network Computing 2017-01-01
_[69] Book In-Datacenter Performance Analysis of a Tensor Processing Unit Association for Computing Machinery 2017-06-24
_[70] Website MLU 100 intelligence accelerator card https://www.cambrico[...] Cambricon 2024
_[71] Book 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) IEEE 2016-06
_[72] Book 2017 IEEE International Solid-State Circuits Conference (ISSCC) 2023-08-24
_[73] Book 2018 IEEE International Solid - State Circuits Conference - (ISSCC) 2023-11-30
_[74] Book 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) IEEE 2016-06
_[75] Book 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) IEEE 2017-02
_[76] Publication Equivalent-accuracy accelerated neural-network training using analogue memory 2018-06
_[77] Book 2017 IEEE International Electron Devices Meeting (IEDM) IEEE 2017-12
_[78] Publication Memristive devices for computing https://www.nature.c[...] 2013-01
_[79] Publication ISAAC 2016-10-12
_[80] Book FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture 2019-01-27
_[81] Book 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS) IEEE 2019-11
_[82] Publication Accurate deep neural network inference using computational phase-change memory 2020-05-18
_[83] Website Nvidia claims 'record performance' for Hopper MLPerf debut https://www.theregis[...]
_[84] Website Development of a machine vision system for weed control using precision chemical application http://www.abe.ufl.e[...]
_[85] Website Self-Driving Cars Technology & Solutions from NVIDIA Automotive https://www.nvidia.c[...]
_[86] Website movidius powers worlds most intelligent drone https://www.siliconr[...] 2016-03-16
_[87] Website Qualcomm Research brings server class machine learning to everyday devices–making them smarter [VIDEO] https://www.qualcomm[...] 2015-10
_[88] News A Survey on Hardware Accelerators and Optimization Techniques for RNNs https://www.research[...] JSA 2020
_[89] Website Intel unveils Movidius Compute Stick USB AI Accelerator https://www.v3.co.uk[...] 2017-08-11
_[90] Website Inspurs unveils GX4 AI Accelerator https://insidehpc.co[...] 2020-07-23
_[91] News Neural Magic raises $15 million to boost AI inferencing speed on off-the-shelf processors https://venturebeat.[...] 2020-03-14
_[92] Website Google Developing AI Processors https://web.archive.[...] 2020-07-23
_[93] Document A Survey of ReRAM-based Architectures for Processing-in-memory and Neural Networks https://www.academia[...] S. Mittal, Machine Learning and Knowledge Extraction 2018
_[94] Website 13 Sextillion & Counting: The Long & Winding Road to the Most Frequently Manufactured Human Artifact in History https://www.computer[...] 2019-07-28
_[95] Website convolutional neural network demo from 1993 featuring DSP32 accelerator https://www.youtube.[...] 2020-10-19
_[96] Website design of a connectionist network supercomputer http://people.eecs.b[...] 2020-10-19
_[97] Website The end of general purpose computers (not) https://www.youtube.[...] 2020-07-23
_[98] Book Proceedings of 9th International Parallel Processing Symposium
_[99] Website Space Efficient Neural Net Implementation https://www.research[...] 2020-10-19
_[100] Book 1996 IEEE International Symposium on Circuits and Systems. Circuits and Systems Connecting the World. ISCAS 96
_[101] Website Application of the ANNA Neural Network Chip to High-Speed Character Recognition http://yann.lecun.co[...] 2020-10-19
_[102] Publication Synergistic Processing in Cell's Multicore Architecture
_[103] Publication Performance of Cell processor for biomolecular simulations
_[104] Book Video Processing and Retrieval on Cell architecture
_[105] Book 2006 IEEE Symposium on Interactive Ray Tracing
_[106] Book 2008 5th IEEE Consumer Communications and Networking Conference
_[107] Website Development of an artificial neural network on a heterogeneous multicore architecture to predict a successful weight loss in obese individuals https://www.teco.edu[...] 2020-07-23
_[108] Book Euro-Par 2008 – Parallel Processing
_[109] Website Improving the performance of video with AVX https://software.int[...] 2012-02-08
_[110] Website 【後藤弘茂のWeekly海外ニュース】 iPhone Xの深層学習コア「Neural Engine」の方向性 https://pc.watch.imp[...] 株式会社インプレス 2023-06-22
_[111] Website アップルが開発した「ニューラルエンジン」は、人工知能でiPhoneに革新をもたらす https://wired.jp/201[...] 2023-06-22
_[112] Website x86初のAIプロセッサ「Ryzen AI」は何がスゴイのかAMDが説明市場投入第1弾は「Razer Blade 14」 https://www.itmedia.[...] 2023-06-22
_[113] Website Ryzen Pro 7000シリーズを発表、Ryzen AIはWindows 11で対応済み AMD CPUロードマップ (2/3) https://ascii.jp/ele[...] ASCII 2023-06-22
_[114] Website Intel新ロードマップを発表。Meteor Lake、Arrow Lake、Lunar Lakeへと進化 https://pc.watch.imp[...] 株式会社インプレス 2023-06-22
_[115] News IntelのMeteor Lake搭載ノート、dGPUなしでStable Diffusionを高速処理 - PC Watch https://pc.watch.imp[...]
_[116] News 用語集 | iSUS https://www.isus.jp/[...]
_[117] Website microsoft research/pixel shaders/MNIST https://hal.inria.fr[...] 2020-10-19
_[118] Website How GPU came to be used for general computation http://igoro.com/arc[...] 2020-10-19
_[119] Website imagenet classification with deep convolutional neural networks https://papers.nips.[...] 2020-10-19
_[120] Website nvidia introduces supercomputer for self driving cars https://web.archive.[...] 2016-01-06
_[121] Website nvidia driving the development of deep learning http://insidehpc.com[...] 2016-05-17
_[122] Website how nvlink will enable faster easier multi GPU computing https://devblogs.nvi[...] 2014-11-14
_[123] Website CUDA 9 Features Revealed: Volta, Cooperative Groups and More https://devblogs.nvi[...] 2017-08-12
_[124] News A Survey on Optimized Implementation of Deep Learning Models on the NVIDIA Jetson Platform https://www.research[...] 2019
_[125] Website Space Efficient Neural Net Implementation https://www.research[...] 2020-07-23
_[126] Book 1996 IEEE International Symposium on Circuits and Systems. Circuits and Systems Connecting the World. ISCAS 96
_[127] Website FPGA Based Deep Learning Accelerators Take on ASICs http://www.nextplatf[...] 2016-09-07
_[128] Website Project Brainwave https://www.microsof[...] 2020-06-16
_[129] News A Survey of FPGA-based Accelerators for Convolutional Neural Networks https://www.academia[...] NCAA 2018
_[130] Website Google boosts machine learning with its Tensor Processing Unit http://techreport.co[...] 2016-09-13
_[131] Website Chip could bring deep learning to mobile devices https://www.scienced[...] 2016-09-13
_[132] Website Deep Learning with Limited Numerical Precision http://jmlr.org/proc[...] 2020-07-23
_[133] arXiv XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
_[134] Website Intel unveils Nervana Neural Net L-1000 for accelerated AI training https://venturebeat.[...] 2018-05-23
_[135] Website Intel Lays Out New Roadmap for AI Portfolio https://www.top500.o[...] 2018-05-23
_[136] Website Intel To Launch Spring Crest, Its First Neural Network Processor, In 2019 https://www.tomshard[...] 2018-05-23
_[137] Website Available TensorFlow Ops | Cloud TPU | Google Cloud https://cloud.google[...] 2018-05-23
_[138] Website ResNet-50 using BFloat16 on TPU https://github.com/t[...] 2018-05-23
_[139] Website Comparing Google's TPUv2 against Nvidia's V100 on ResNet-50 https://blog.riseml.[...] 2018-05-23
_[140] Report TensorFlow Distributions 2017-11-28
_[141] Website Facebook has a new job posting calling for chip designers https://social.techc[...] 2020-10-19
_[142] Website Subscribe to read | Financial Times https://www.ft.com/c[...] 2020-10-19
_[143] Paper Temporal correlation detection using computational phase-change memory
_[144] News A new brain-inspired architecture could improve how computers handle data and advance AI https://phys.org/new[...] 2018-10-05
_[145] Publication In-memory computing on a photonic platform
_[146] Paper Solving matrix equations in one step with cross-point resistive arrays
_[147] Paper Logic-in-memory based on an atomically thin semiconductor
_[148] Website NVIDIA launches the World's First Graphics Processing Unit, the GeForce 256 http://www.nvidia.co[...] 2020-10-19
_[149] Website Self-Driving Cars Technology & Solutions from NVIDIA Automotive https://www.nvidia.c[...] 2020-10-19
_[150] Website design of a machine vision system for weed control http://abe.ufl.edu/w[...] 2016-06-17
_[151] Website qualcomm research brings server class machine learning to every data devices https://www.qualcomm[...] 2020-08-30
_[152] Website movidius powers worlds most intelligent drone https://www.siliconr[...] 2020-08-30
_[153] Web citation Intel unveils Movidius Compute Stick USB AI Accelerator https://www.v3.co.uk[...] 2017-08-11
_[154] Web citation Inspurs unveils GX4 AI Accelerator https://insidehpc.co[...] 2017-06-21
_[155] Citation Neural Magic raises $15 million to boost AI inferencing speed on off-the-shelf processors https://venturebeat.[...] 2020-03-14
_[156] Web citation Google Designing AI Processors https://www.eetimes.[...]
_[157] Web citation Nvidia reveals new Hopper H100 GPU, with 80 billion transistors https://www.datacent[...] 2024-01-30
_[158] Web citation Inside Track https://archive.org/[...] PC Magazine 2023-12-26
_[159] Web citation convolutional neural network demo from 1993 featuring DSP32 accelerator https://www.youtube.[...]
_[160] Journal citation Shift-invariant pattern recognition neural network and its optical architecture 1988
_[161] Journal citation Parallel distributed processing model with local space-invariant interconnections and its optical architecture 1990
_[162] Journal citation Designing a connectionist network supercomputer https://www.research[...] ResearchGate 2023-12-26
_[163] Web citation The end of general purpose computers (not) https://www.youtube.[...]
_[164] Web citation Space Efficient Neural Net Implementation https://www.research[...] 2023-12-26
_[165] Book 1996 IEEE International Symposium on Circuits and Systems. Circuits and Systems Connecting the World. ISCAS 96
_[166] Web citation Qualcomm Helps Make Your Mobile Devices Smarter With New Snapdragon Machine Learning Software Development Kit https://www.qualcomm[...]
_[167] Web citation Qualcomm's Zeroth platform could make your smartphone much smarter https://www.cnet.com[...] 2021-09-28
_[168] Journal citation Synergistic Processing in Cell's Multicore Architecture
_[169] Journal citation Performance of Cell processor for biomolecular simulations
_[170] Book Video Processing and Retrieval on Cell architecture
_[171] Book 2006 IEEE Symposium on Interactive Ray Tracing
_[172] Web citation Development of an artificial neural network on a heterogeneous multicore architecture to predict a successful weight loss in obese individuals https://web.archive.[...] 2017-11-14
_[173] Book 2008 5th IEEE Consumer Communications and Networking Conference
_[174] Book Euro-Par 2008 – Parallel Processing
_[175] Web citation Improving the performance of video with AVX https://software.int[...] 2012-02-08
_[176] Web citation High Performance Convolutional Neural Networks for Document Processing https://inria.hal.sc[...] 10th International Workshop on Frontiers in Handwriting Recognition 2006-10-23
_[177] Journal ImageNet Classification with Deep Convolutional Neural Networks 2017-05-24
_[178] Book 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) 2019-08
_[179] Web citation FPGA Based Deep Learning Accelerators Take on ASICs http://www.nextplatf[...] 2016-08-23
_[180] Web citation Microsoft unveils Project Brainwave for real-time AI https://www.microsof[...] 2017-08-22
_[181] Web citation Google Cloud announces the 5th generation of its custom TPUs https://techcrunch.c[...] 2023-08-29
_[182] Web citation Google boosts machine learning with its Tensor Processing Unit https://techreport.c[...] 2016-05-19
_[183] Web citation Chip could bring deep learning to mobile devices https://www.scienced[...] 2016-02-03
_[184] Web citation Deep Learning with Limited Numerical Precision http://proceedings.m[...]
_[185] arXiv XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
_[186] Web citation Intel unveils Nervana Neural Net L-1000 for accelerated AI training https://venturebeat.[...] 2018-05-23
_[187] Web citation Intel Lays Out New Roadmap for AI Portfolio https://www.top500.o[...] 2018-05-23
_[188] Web citation Intel To Launch Spring Crest, Its First Neural Network Processor, In 2019 https://www.tomshard[...] 2018-05-23
_[189] Web citation Available TensorFlow Ops | Cloud TPU | Google Cloud https://cloud.google[...]
_[190] Web citation Comparing Google's TPUv2 against Nvidia's V100 on ResNet-50 https://blog.riseml.[...] 2018-04-26
_[191] Web citation ResNet-50 using BFloat16 on TPU https://github.com/t[...] 2018-02-28
_[192] Report TensorFlow Distributions 2017-11-28
_[193] Web citation Google Reveals a Powerful New AI Chip and Supercomputer https://www.technolo[...]
_[194] Web citation What to Expect From Apple's Neural Engine in the A11 Bionic SoC – ExtremeTech https://www.extremet[...]
_[195] Web citation Facebook has a new job posting calling for chip designers https://social.techc[...] 2018-04-19
_[196] News Facebook joins Amazon and Google in AI chip race https://www.ft.com/c[...] 2019-02-18
_[197] Web citation Samsung and AMD will reportedly take on Apple's M1 SoC later this year https://arstechnica.[...] 2021-05-11
_[198] Web citation The AI Race Expands: Qualcomm Reveals "Cloud AI 100" Family of Datacenter AI Inference Accelerators for 2020 https://www.anandtec[...]
_[199] Web citation Cerebras Hits the Accelerator for Deep Learning Workloads https://www.datanami[...] 2021-11-01
_[200] Web citation Cerebras launches new AI supercomputing processor with 2.6 trillion transistors https://venturebeat.[...] 2021-04-20